Learning Near-Optimal Intrusion Responses Against Dynamic Attackers

Authors

Abstract

We study automated intrusion response and formulate the interaction between an attacker and a defender as an optimal stopping game where attack and defense strategies evolve through reinforcement learning and self-play. The game-theoretic modeling enables us to find defender strategies that are effective against a dynamic attacker, i.e., an attacker that adapts its strategy in response to the defender strategy. Further, the optimal stopping formulation allows us to prove that best response strategies have threshold properties. To obtain near-optimal defender strategies, we develop Threshold Fictitious Self-Play (T-FP), a fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that T-FP outperforms a state-of-the-art algorithm for our use case. The experimental part of this investigation includes two systems: a simulation system where defender strategies are incrementally learned and an emulation system where statistics are collected that drive simulation runs and where learned strategies are evaluated. We argue that this approach can produce effective defender strategies for a practical IT infrastructure.
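
The fictitious-play idea that T-FP builds on can be illustrated with a minimal sketch. This is not T-FP itself: it is classic fictitious play on a two-player zero-sum matrix game, and it leaves out the optimal stopping game, the threshold strategies, and the stochastic approximation that the abstract describes. The function name `fictitious_play`, the payoff matrix, and the iteration count are assumptions chosen for illustration only.

```python
def fictitious_play(payoff, iterations=10000):
    """Classic fictitious play for a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff when row plays i and column
    plays j; the column player receives the negative of that value.
    Returns the empirical action frequencies of both players.
    (Illustrative sketch only; this is NOT the T-FP algorithm.)
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Counts of how often each action has been played so far.
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary initial actions so the empirical mixtures are defined.
    row_counts[0] = 1
    col_counts[0] = 1

    for _ in range(iterations):
        # Value of each row action against the column player's history.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Value (to the row player) of each column action against the
        # row player's history; the column player wants to minimize it.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Both players best-respond simultaneously (ties go to the
        # lowest index), then the histories are updated.
        row_br = max(range(n_rows), key=row_values.__getitem__)
        col_br = min(range(n_cols), key=col_values.__getitem__)
        row_counts[row_br] += 1
        col_counts[col_br] += 1

    row_total = sum(row_counts)
    col_total = sum(col_counts)
    return (
        [c / row_total for c in row_counts],
        [c / col_total for c in col_counts],
    )


# Matching pennies: the unique Nash equilibrium mixes both actions equally.
matching_pennies = [[1, -1], [-1, 1]]
row_mix, col_mix = fictitious_play(matching_pennies)
print(row_mix, col_mix)
```

In zero-sum matrix games the empirical frequencies produced this way are known to approach a Nash equilibrium, so for matching pennies both mixtures drift toward an even split between the two actions.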

Similar resources

Lazy Defenders Are Almost Optimal against Diligent Attackers

Most work building on the Stackelberg security games model assumes that the attacker can perfectly observe the defender’s randomized assignment of resources to targets. This assumption has been challenged by recent papers, which designed tailor-made algorithms that compute optimal defender strategies for security games with limited surveillance. We analytically demonstrate that in zero-sum secu...

Defending against Roadside Attackers

Communication between vehicles is a very promising technology to reduce fatalities and injuries in road traffic. Vehicles spontaneously form communication networks where they exchange messages to warn surrounding vehicles. Theoretically, this system is open to any communication node that is equipped with the required communication technology. This openness demands appropriate security mechanism...

Defending against multiple different attackers

One defender defends, and multiple heterogeneous attackers attack, an asset. Three scenarios are considered: the agents move simultaneously; the defender moves first; or the attackers move first. We show how the agents’ unit costs of defense and attack, their asset evaluations, and the number of attackers influence their investments, profits, and withdrawal decisions. Withdrawal does not occur ...

Near-optimal Regret Bounds for Reinforcement Learning

This technical report is an extended version of [1]. For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s' there is a policy which moves from s to s' i...

Journal

Journal title: IEEE Transactions on Network and Service Management

Year: 2023

ISSN: 2373-7379, 1932-4537

DOI: https://doi.org/10.1109/tnsm.2023.3293413